PATENT ABSTRACTS OF JAPAN 

(11)Publication number : 09-322044 
(43)Date of publication of application : 12.12.1997 



(51)Int.Cl. 

H04N 5/225 
G10L 3/00 
H04N 7/15 
H04R 1/02 




(21)Application number : 08-137820 
(71)Applicant : SHARP CORP 
(22)Date of filing : 31.05.1996 
(72)Inventor : YAMAMOTO MAKOTO 



(54) TELEVISION CAMERA DEVICE 

(57)Abstract: 

PROBLEM TO BE SOLVED: To secure privacy when there is sound which 
the user temporarily does not want picked up, by stopping transmission 
of the conversation voice, or transmitting holding music in its place, 
through a computer system or the like. This is done by providing a 
shifting means which, when the front surface of the image pickup lens 
is light-shielded for a prescribed time, shifts to a control mode in which 
voice gathered by the microphone is not sent outside the television 
camera device. 



SOLUTION: The control part 11 of this television camera device 1a 
controls the whole device. A video image pickup part 12 picks up the 
video images of an object, converts them to television video signals 
and outputs them. A timing storage part 14 stores the set value of a 
timer operation, and a voice recognition processing part 15 identifies a 
predetermined vocabulary in the gathered voice and converts it to a 
control code. A voice processing part 16 processes the voice signals, for 
example by interrupting the voice signals from the microphone 19 and 
sending out holding music or the like in their place. 
* NOTICES * 



Japan Patent Office is not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer, so the translation may not reflect the original precisely. 
2. **** shows a word which cannot be translated. 
3. In the drawings, any words are not translated. 



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to television camera equipment which picks up an image, and in 
particular to television camera equipment which is suitable for use in a video conference system or TV phone equipment 
that transmits the image to a remote place using means of communications, and which can arbitrarily process the voice 
it has collected. 
[0002] 

[Description of the Prior Art] TV phone and video conference systems, which use image information in addition to 
speech information as a means of communicating with people in a distant location, are conventionally known. 

[0003] Techniques for controlling by voice the television camera equipment used in the above-mentioned TV phone or 
video conference system are already known. As means of separating control voice from the conversation which is the 
main voice, apart from the technique of switching by operating a switch, an example which separates the two using two 
microphones is disclosed in Japanese Laid-Open Patent Publication No. 5-61497. Moreover, an example which separates 
control voice from the conversation by performing frequency analysis on the collected voice is disclosed in Japanese 
Laid-Open Patent Publication No. 5-173592. 
[0004] 

[Problem(s) to be Solved by the Invention] However, as the advance of communication networks has made it possible to 
exchange images and voice with many unspecified partners, it has become necessary to protect one's privacy, that is, to 
withhold images, voice, etc. which one does not want to open to a partner. Control of the television camera equipment 
by a conventional keyboard, mouse, etc. requires complicated operation, and in the technique of automatically 
separating control voice from the conversation which is the main voice, the configuration of the device is very 
complicated. 

[0005] This invention was made in view of such a situation. Its object is to offer television camera equipment which 
shifts to the control mode by a motion of the hand alone, without depending on the type of computer apparatus 
connected, and thereby easily cuts off the sending out of conversation voice which the user does not want to open to a 
partner. In addition, since in the control mode only control voice consisting of a specific vocabulary defined 
beforehand is recognized, without being influenced by the conversation which is the main voice, the speech recognition 
device can be simplified, and processing of the voice sent out to a partner can easily be realized by voice control. 
[0006] 

[Means for Solving the Problem] In order to solve the above-mentioned technical problem, this invention uses the 
following configuration in television camera equipment which is connected to a system that transmits an image over a 
communication line, and which is equipped with an image pick-up means to pick up an image and a sound-collecting 
means to collect voice with a microphone. 

[0007] Namely, in the television camera equipment concerning invention according to claim 1, the control section is 
equipped with a function to hold the history of the electrical signal level from the CCD image sensor, which is always 
measured in order to perform the white balance control that guarantees the color reproduction of a color image. By 
acquiring from this history the information that the electrical signal level from the CCD image sensor decreased 
rapidly and stayed low for a predetermined continuous time, the equipment can know that a hand has shaded the front 
face of the image pick-up lens. 

[0008] Moreover, the television camera equipment concerning invention according to claim 2 has a speech recognition 
means for the speech information obtained with the sound-collecting means, a speech processing means to perform 
predetermined processing on the voice sent out to the exterior of the television camera equipment, and a timing storage 
means. After shifting to the control mode, sending out to the exterior of the control voice collected with the 
microphone is stopped, and when the time elapsed in the control mode agrees with the setting information in the timing 
storage means, termination of the control mode is detected and the television camera equipment is automatically 
returned to normal mode operation. 
[0009] And the television camera equipment concerning invention according to claim 3 has, in the configuration 
according to claim 2, a control means which controls operation of the television camera equipment by speech 
recognition of control voice which is collected with the microphone and consists of a specific vocabulary defined 
beforehand. 

[0010] And the television camera equipment concerning invention according to claim 4 has a display means which can be 
switched arbitrarily between a lit and an unlit condition under control of the control section. Since a display action 
is thereby performed according to whether the voice collected with the microphone is being sent out to the partner, 
whether control voice has been recognized, and ****** of the control mode, the user can also know what operating state 
the television camera equipment is in. 
[0011] 

[Embodiment of the Invention] Hereafter, an embodiment of the television camera equipment concerning this invention 
is explained in detail based on the drawings. Drawing 1 is the block diagram showing the outline of a video conference 
system in which the television camera equipment of this invention is used. 
[0012] In drawing 1, the image and voice picked up by one television camera equipment 1a are transmitted to computer 
apparatus 2a, connected by the video-signal line and the sound signal line, and are then conveyed to the display and 
the loudspeaker section of computer apparatus 2b in a distant location via the communication network 3, such as a 
public telephone line. 

[0013] Similarly, the image and voice picked up by the other television camera equipment 1b are transmitted to 
computer apparatus 2b, connected by the video-signal line and the sound signal line, and are then conveyed to the 
display and the loudspeaker section of computer apparatus 2a in the distant location via the communication network 3, 
such as a dial-up line. 

[0014] Drawing 2 is the block diagram showing the whole configuration of the television camera equipment 1a and 1b of 
drawing 1, and drawing 3 is the perspective view showing the appearance of the television camera equipment 1a and 1b 
of drawing 1. 

[0015] The television camera equipment 1a and 1b has the control section 11 which controls the whole equipment, the 
image pick-up section 12 which picks up the image of a photographic subject, changes it into a television video signal 
and outputs it, the timing storage section 14 which memorizes the set point of timer operation, the speech recognition 
processing section 15 which changes a specific vocabulary found in the collected voice into a control code, and the 
speech processing section 16 which processes the sound signal, for example by cutting off the sound signal from the 
microphone or sending out holding music in its place. 

[0016] The image pick-up lens 111 is provided on the front face of the television camera equipment 1a and 1b. The 
microphone 19, which collects voice, and LED13, whose lighting is controlled by the control section 11, are arranged 
horizontally beside it, exposed from the case 24. Moreover, the electric power switch 20 is arranged on the side of the 
television camera equipment. 

[0017] Moreover, the control section 11 has an internal timer and a nonvolatile memory (neither is illustrated). The 
internal timer always keeps exact time by counting the frequency of an oscillator circuit. The nonvolatile memory 
stores beforehand the digital codes of notes, which become recognizable as music when converted by the D/A conversion 
circuit of the speech processing section 16, and the specific vocabulary which the speech recognition processing 
section 15 should recognize as control codes. 
[0018] Furthermore, the image pick-up section 12 consists of the CCD image sensor 113, which changes light into an 
electrical signal; the lens 111, which optically forms the image of the photographic subject on the front face of the 
CCD image sensor 113; the digital signal processing circuit 112, which changes the electrical signal from the CCD image 
sensor into the NTSC television video signal, made up of 60 images per second, or the PAL television video signal, made 
up of 50 images per second; the synchronizing signal generator 115, which provides the synchronization needed for the 
conversion into a television video signal; and the timing pulse generating circuit 116 and the vertical driver circuit 
114, which drive the CCD image sensor. 
[0019] The above-mentioned image pick-up section 12 corresponds to the image pick-up means of the claims, the image 
processing section 13 to the image processing means, the speech recognition processing section 15 to the speech 
recognition means, the speech processing section 16 to the speech processing means, and the control section 11 and 
LED13 to the display means, respectively. 
[0020] Next, operation of the television camera equipment 1a and 1b of the above-mentioned configuration is explained 
with reference to the state transition diagram shown in drawing 4. For example, if the electric power switch 20 of one 
television camera equipment 1a is operated and the power is switched on, the television camera equipment 1a enters the 
normal operation mode, and the television video signal from the image pick-up section 12 is transmitted to computer 
apparatus 2a from the image output terminal 17. 
[0021] Similarly, the volume of the conversation voice collected with the microphone 19 is controlled in the speech 
processing section 16 based on the amplification degree set by the control section 11, and the voice is then 
transmitted to computer apparatus 2a from the voice output terminal 18. Moreover, LED13, which is the display device of 
the television camera equipment 1a, stays lit and reports to the user that the television camera equipment is operating 
in normal operation mode. Therefore, the image of the photographic subject is displayed on the display of computer 
apparatus 2a, and conversation voice is heard from the loudspeaker. 

[0022] When, in this normal operation mode, a reason arises to hold a conversation which should not be heard by the 
user on the computer apparatus 2b side, the user of the television camera equipment 1a covers the microphone section of 
the television camera equipment 1a by hand for 1 second or more, just as one habitually covers the mouthpiece of a 
conventional telephone by hand. Since the lens 111, which is installed close to the microphone, is also covered by the 
hand, the image input enters a light-shielded condition. The control section 11 acquires, through the digital signal 
processing circuit 112, the history that the electrical signal level from the CCD image sensor 113 shifted rapidly to 
black level and stayed there for 1 second, whereupon it immediately issues a cutoff instruction to the speech 
processing section 16 and shifts to the control mode. 
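
The shading check described above can be sketched in Python as follows. This is only an illustrative sketch, not the circuit of the embodiment: the class name ShadeDetector, the normalised 0.0 to 1.0 level scale, the sample rate and both thresholds are assumptions introduced here. It keeps a short history of signal levels and reports shading once, at the moment when a bright sample has been followed by a full hold period of black-level samples.

```python
from collections import deque


class ShadeDetector:
    """Hypothetical detector for a lens covered continuously for a hold period.

    Assumptions (not from the source): levels are normalised to 0.0-1.0 and
    arrive at a fixed sample rate; the two thresholds are illustrative.
    """

    def __init__(self, sample_rate_hz=30, hold_seconds=1.0,
                 black_level=0.05, bright_level=0.3):
        self.black_level = black_level
        self.bright_level = bright_level
        # Number of consecutive dark samples that make up the hold period.
        self.required = int(sample_rate_hz * hold_seconds)
        # Level history: one pre-drop sample plus the hold period.
        self.history = deque(maxlen=self.required + 1)

    def push(self, level):
        """Record one level sample; return True when shading is detected."""
        self.history.append(level)
        if len(self.history) < self.history.maxlen:
            return False
        samples = list(self.history)
        # "Rapid" drop: the sample just before the dark run was still bright.
        was_bright = samples[0] >= self.bright_level
        stayed_dark = all(s <= self.black_level for s in samples[1:])
        return was_bright and stayed_dark


detector = ShadeDetector(sample_rate_hz=10)
levels = [0.6] * 5 + [0.01] * 10
events = [detector.push(level) for level in levels]
print(events[-1])  # shading reported on the tenth dark sample
```

Because the oldest sample in the window must still be bright, the detector fires once per covering and stays quiet while the lens remains covered; a slow dimming of the scene does not trigger it.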

[0023] In the control mode, since the voice cutoff instruction has been issued from the control section 11 to the 
speech processing section 16, the amplification degree is 0 and the conversation voice collected with the microphone 19 
is not transmitted from the voice output terminal 18 to computer apparatus 2a. Naturally it is not transmitted to 
computer apparatus 2b of the partner either, so privacy can be secured. Moreover, LED13, which is the display device of 
the television camera equipment 1a, stays unlit and reports to the user that the television camera equipment 1a is 
operating in the control mode. 

[0024] Sound collection with the microphone 19 continues in the control mode, and the speech recognition processing 
section 15 performs speech recognition on the collected voice. When a user wants to change the operation of the 
television camera equipment, the user first utters the word "control", in order to avoid incorrect recognition, and 
then utters a control word from the vocabulary defined beforehand and set in the control section 11. The modification 
predetermined for that control word is then applied to the television camera equipment. 

[0025] When the speech recognition processing section 15 has recognized the user's control voice, LED13 blinks twice 
within a period of 1 second to report the recognition to the user. The control voice recognized in the control mode is 
limited to the six kinds shown in drawing 4, namely "holding music", "silent", "small", "large", "X seconds" and 
"return", which lowers the incorrect recognition rate of the speech recognition processing section 15 and enables 
reliable recognition. 
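
As a rough illustration of why a fixed vocabulary behind a prefix word keeps recognition simple, the following Python sketch maps an already recognised word sequence to a command. It assumes a separate recogniser has produced the words as text; the function name parse_control_utterance, the English spellings of the six control words and the tuple return format are assumptions made for this example, not part of the specification.

```python
import re

# Prefix word that must precede every control word.
CONTROL_PREFIX = "control"
# Five of the six control words take no argument (spellings are assumed).
SIMPLE_WORDS = {"holding music", "silent", "small", "large", "return"}
# The sixth, "X seconds", carries a number.
SECONDS_PATTERN = re.compile(r"^(\d+) seconds?$")


def parse_control_utterance(words):
    """Map a recognised word sequence to (command, argument) or None.

    `words` is a list such as ["control", "10 seconds"]. Anything that does
    not start with the prefix, or is outside the fixed vocabulary, is ignored.
    """
    if len(words) != 2 or words[0] != CONTROL_PREFIX:
        return None
    word = words[1]
    if word in SIMPLE_WORDS:
        return (word, None)
    match = SECONDS_PATTERN.match(word)
    if match:
        return ("seconds", int(match.group(1)))
    return None


print(parse_control_utterance(["control", "10 seconds"]))
print(parse_control_utterance(["silent"]))
```

Ordinary conversation that happens to contain "small" or "return" is rejected because it lacks the prefix, which is the role the text gives to the word "control".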

[0026] The control performed on the television camera equipment for each control word is explained below. First, when 
"holding music" has been recognized as the control word, the control section 11 of the television camera equipment 
generates holding music by converting the digital codes of the notes, stored beforehand in its internal nonvolatile 
memory, with the D/A conversion circuit of the speech processing section 16, and the holding music is sent out from 
the voice output terminal 18 throughout the control mode. 

[0027] Moreover, when "silent" has been recognized as the control word while holding music is being sent out from the 
speech processing section 16, the control section 11 of the television camera equipment stops the sending out, so that 
no voice is sent out from the voice output terminal 18. Naturally, immediately after the shift to the control mode the 
output is already silent, so in that case no change is produced. 
[0028] When "small" has been recognized as the control word, the control section 11 of the television camera equipment 
changes the setting so that the amplification degree becomes smaller than the previous setting. This is the 
amplification degree used, after returning to normal operation mode, when the speech processing section 16 amplifies 
the conversation voice collected with the microphone 19 and sends it out from the voice output terminal 18. 
[0029] Likewise, when "large" has been recognized as the control word, the control section 11 of the television camera 
equipment changes the setting so that this amplification degree becomes larger than the previous setting. 
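
The handling of the four audio-related control words can be pictured with the short Python sketch below. It is a simplified model under stated assumptions, not the circuit of the speech processing section 16: the class name SpeechProcessor, the per-sample interface, the gain step of 0.25 and the English spellings of the control words are all hypothetical. What it illustrates is that the output in the control mode is either silence or holding music, while "small" and "large" only change the amplification degree that takes effect after the return to normal operation mode.

```python
class SpeechProcessor:
    """Hypothetical model of output selection and gain in the two modes."""

    def __init__(self, gain=1.0, gain_step=0.25):
        self.gain = gain            # amplification degree used in normal mode
        self.gain_step = gain_step  # assumed step size for "small"/"large"
        self.muted = False          # True while in the control mode
        self.hold_music = False     # True once "holding music" was requested

    def enter_control_mode(self):
        # Cutoff instruction: the output starts out silent.
        self.muted = True
        self.hold_music = False

    def leave_control_mode(self):
        self.muted = False
        self.hold_music = False

    def apply(self, command):
        if command == "holding music":
            self.hold_music = True
        elif command == "silent":
            self.hold_music = False
        elif command == "small":
            self.gain = max(0.0, self.gain - self.gain_step)
        elif command == "large":
            self.gain += self.gain_step

    def output(self, mic_sample, music_sample):
        """Return the sample sent to the voice output terminal."""
        if self.muted:
            return music_sample if self.hold_music else 0.0
        return mic_sample * self.gain


processor = SpeechProcessor()
processor.enter_control_mode()
processor.apply("holding music")
print(processor.output(mic_sample=0.5, music_sample=0.25))
```

Keeping the gain change separate from the mute state means that a "small" or "large" command spoken in the control mode is never audible to the partner; it is only heard as a volume change once normal operation resumes.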
[0030] Furthermore, when "X seconds" has been recognized as the control word, the control section 11 of the television 
camera equipment changes the setup time held in the timing storage section 14, which is the time until automatic return 
to normal operation mode. For example, when "10 seconds" has been recognized as the control word, the setup time is 
changed to 10 seconds. 

[0031] When "return" has been recognized as the control word, the control section 11 of the television camera equipment 
shifts to LED flashing mode immediately, even if the elapsed time since the shift from normal operation mode, clocked 
by the internal timer, has not reached the setup time of the timing storage section 14. 

[0032] This completes the explanation of the control performed for each control word. In the control mode, the elapsed 
time since the shift from normal operation mode is always clocked by the internal timer of the control section 11, and 
when it agrees with the setup time of the timing storage section 14, the equipment shifts to LED flashing mode. 

[0033] In LED flashing mode, LED13 flashes intermittently in a cycle of 1 second for 5 seconds, reporting to the user 
that the shift to normal operation mode is near. After the 5 seconds have passed, the voice output cutoff is canceled 
and the equipment returns to normal operation mode, in which the volume of the conversation voice collected with the 
microphone 19 is controlled in the speech processing section 16 based on the amplification degree set by the control 
section 11 and the voice is transmitted to computer apparatus 2a from the voice output terminal 18. 
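
The mode transitions described in paragraphs [0022] and [0030] to [0033] can be summarised as a small state machine. The Python sketch below is one possible reading, not the state transition diagram of drawing 4 itself: the names Mode and ModeController, the tick-based timing and the default timeout of 30 seconds are assumptions, and it assumes that shading the lens only has an effect in normal operation mode.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    CONTROL = "control"
    LED_FLASHING = "led_flashing"


class ModeController:
    """Hypothetical sketch of the three operating modes and their timers."""

    FLASH_SECONDS = 5.0  # flashing period before the return, per the text

    def __init__(self, control_timeout=30.0):
        # Plays the role of the setup time in the timing storage section 14;
        # the default of 30 s is an assumption, not a value from the source.
        self.control_timeout = control_timeout
        self.mode = Mode.NORMAL
        self.elapsed = 0.0

    def _enter(self, mode):
        self.mode = mode
        self.elapsed = 0.0

    def lens_shaded(self):
        """Called when the lens has been covered for 1 second or more."""
        if self.mode is Mode.NORMAL:
            self._enter(Mode.CONTROL)

    def command(self, name, argument=None):
        """Apply a recognised control word; ignored outside the control mode."""
        if self.mode is not Mode.CONTROL:
            return
        if name == "seconds":
            self.control_timeout = float(argument)
        elif name == "return":
            self._enter(Mode.LED_FLASHING)

    def tick(self, dt):
        """Advance the internal timer by dt seconds."""
        self.elapsed += dt
        if self.mode is Mode.CONTROL and self.elapsed >= self.control_timeout:
            self._enter(Mode.LED_FLASHING)
        elif (self.mode is Mode.LED_FLASHING
              and self.elapsed >= self.FLASH_SECONDS):
            self._enter(Mode.NORMAL)


controller = ModeController()
controller.lens_shaded()
controller.command("seconds", 10)
for _ in range(10):
    controller.tick(1.0)
print(controller.mode)
```

In this reading, "X seconds" only changes the timeout, while "return" and the timeout both lead through LED flashing mode, so the user always gets the 5-second warning before the microphone is live again.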
[0034] On return to normal operation mode, LED13, which is the display device of the television camera equipment 1a, 
stays lit and reports to the user that the television camera equipment is operating in normal operation mode. 
Therefore, the image of the photographic subject is displayed on the display of computer apparatus 2a, and conversation 
voice is again heard from the loudspeaker. 
[0035] Thereafter, the shift to the control mode and the return to normal operation mode are repeated whenever the 
lens 111 is shaded for 1 second or more. Naturally, even if the electric power switch 20 is turned off while in the 
control mode, the power to the television camera equipment is cut and the control section is reset, so when the 
electric power switch 20 is switched on again the equipment operates in normal operation mode. 
[0036] 

[Effect of the Invention] Since the television camera equipment of this invention has the above configuration, 
invention according to claim 1 can secure privacy: when a situation arises which the user does not want known to the 
partner, and the user wishes temporarily to keep the present conversation from being collected by the television camera 
equipment, transmission can be halted, or holding music can be transmitted through the computer apparatus etc. in place 
of the conversation voice. 
[0037] Moreover, invention according to claim 2 can easily offer a means of controlling the operating state of the 
television camera equipment, by providing a control mode which can be entered by the simple operation of shading the 
lens section of the television camera equipment and from which return is performed automatically. 

[0038] And invention according to claim 3 limits speech recognition of control voice to the voice collected in the 
control mode and, for the control vocabulary, uses a prefacing word or the like which makes it easy to distinguish from 
conversation voice. The control voice which should be recognized as a control code can therefore be recognized easily 
without being influenced by conversation voice, and a system which controls the television camera equipment by voice 
can be constructed easily. 

[0039] And invention according to claim 4 not only makes it easy to check whether the voice collected with the 
microphone is being sent out to the partner, but also makes it easy to check visually the recognition status of control 
voice in the control mode and that the return from the control mode to the normal mode is imminent. 



[Translation done.] 